Two Dimensional Data Worksheet

This worksheet focuses on manipulating two-dimensional data using Python and pandas.



In [1]:

    
%pylab inline
import pandas as pd
import numpy as np
pd.options.mode.chained_assignment = None









    



Populating the interactive namespace from numpy and matplotlib



In [4]:

    
# Create a DataFrame called twitterData from the CSV file
# Note: if this file is too large for your machine, a smaller data set called twitter1-small.csv is in the data folder
twitterData = pd.read_csv( '../../data/twitter1.csv', encoding='iso8859_15' )
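If the full file is still too large, one option is to read only part of it. The sketch below uses a small in-memory CSV (hypothetical data, not the real twitter1.csv) to show the `nrows` and `usecols` parameters of `pd.read_csv`.

```python
import io
import pandas as pd

# Hypothetical in-memory CSV standing in for the real file.
csv_text = "Username,Friends,Followers\nann,10,20\nbob,5,0\ncat,7,7\n"

# nrows limits how many data rows are parsed;
# usecols keeps only the named columns.
sample = pd.read_csv(
    io.StringIO(csv_text),
    nrows=2,
    usecols=["Username", "Followers"],
)

print(sample)
```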

Exercise 1

Using the twitterData DataFrame and the commands we have learned thus far, create a Series called tweetCounts which contains each user name and how many tweets that user posted. Next, output the top 10 "tweeters".



In [5]:

    
tweetCounts = twitterData['Username'].value_counts()
tweetCounts.head(10)









    Out[5]:





HoolohaTube        155
Rasu24             150
HOOLOHASPORT       126
mahboobali3        119
EminemsRealWife    116
byezekiel           89
MyrtleMuelr         83
LucindaFischer      79
DebraRichayd        77
JeanieNoble         70
Name: Username, dtype: int64
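To see what value_counts does on a small scale, here is a minimal sketch on made-up user names (the `toy` DataFrame is hypothetical). It also shows that grouping by the column and taking the group sizes produces the same tallies.

```python
import pandas as pd

# Hypothetical toy data standing in for twitterData['Username'].
toy = pd.DataFrame({"Username": ["ann", "bob", "ann", "cat", "ann", "bob"]})

# value_counts() counts each distinct value and sorts by count, descending.
counts = toy["Username"].value_counts()

# The same tallies via groupby: size() returns the number of rows per group.
grouped = toy.groupby("Username").size().sort_values(ascending=False)

print(counts.head(2))
```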

Exercise 2

Using the original twitter data set, create a second DataFrame called twitterSummary which contains the following columns:

Username
Friends
Followers

Next, add a column called ffratio which contains the ratio of friends to followers.



In [6]:

    
# .copy() makes twitterSummary independent of twitterData before a column is added
twitterSummary = twitterData[['Username', 'Friends', 'Followers']].copy()
twitterSummary['ffratio'] = twitterSummary['Friends'] / twitterSummary['Followers']

twitterSummary.head()









    Out[6]:

	Username	Friends	Followers	ffratio
0	_prettybrown	1042	1538	0.677503
1	CarlyManning24	278	304	0.914474
2	madzLuvzLakers	619	1039	0.595765
3	_AyyJayy	203	204	0.995098
4	Akeemoneale	165	27	6.111111
Exercise 3

In the Data folder, there is a spreadsheet called studentData.csv consisting of students and test scores. Write a script which calculates each student's average test score and adds that as a column to the DataFrame. The first person to raise their hand and tell me which student has the highest average test score, and what it is, wins something.



In [7]:

    
studentData = pd.read_csv('../../data/studentData.csv')

studentData['average'] = studentData[['Test1', 'Test2', 'Test3', 'Test4', 'Test5']].mean(axis=1)

studentData.sort_values('average', axis=0, ascending=False )
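Sorting shows the whole ranking. If only the top student is needed, idxmax is an alternative. This is a sketch on three rows taken from this worksheet's student results; the `students` and `test_cols` names are chosen here for illustration.

```python
import pandas as pd

# Three rows from the worksheet's student results.
students = pd.DataFrame({
    "Student ID": [1, 7, 9],
    "Test1": [0.78, 0.88, 0.90],
    "Test2": [0.96, 0.59, 0.85],
    "Test3": [0.82, 0.94, 0.76],
    "Test4": [0.63, 0.92, 0.83],
    "Test5": [0.88, 0.83, 0.99],
})

# Select the test columns by name prefix instead of listing them by hand.
test_cols = [c for c in students.columns if c.startswith("Test")]

# axis=1 averages across the columns of each row.
students["average"] = students[test_cols].mean(axis=1)

# idxmax() returns the index label of the largest value;
# .loc then fetches that row.
best = students.loc[students["average"].idxmax()]

print(best)
```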

Exercise 4

Using the twitter data, find all the users with Facebook accounts and create a new column called FacebookID which contains each user's Facebook ID. The ID appears in the URL column, for example http://www.facebook.com/profile.php?id=5141860. Extract it using the str.extract function. Don't forget to remove all the invalid or empty IDs.

We've already created a DataFrame for you in the cell above.



In [8]:

    
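# Keep only rows whose URL mentions Facebook; .copy() makes newData independent of twitterData
newData = twitterData[ twitterData['URL'].fillna("").str.contains('facebook') ].copy()
# Raw string keeps the regex escapes intact; the captured group is the numeric ID
newData['FacebookID'] = newData['URL'].str.extract( r'profile\.php\?id=(\d+)', expand=False)
# Note: dropna() with no arguments drops rows with a missing value in any column, not only FacebookID
newData.dropna( inplace=True )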
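The extraction step can be tried on a handful of URLs first. In this sketch the first URL comes from the data set and the other entries are hypothetical. It assumes you only want to drop rows where the ID itself is missing, which is what the `subset` argument of dropna does.

```python
import pandas as pd

# The first URL appears in the data; the other entries are hypothetical.
toy = pd.DataFrame({"URL": [
    "http://www.facebook.com/profile.php?id=551167346",
    "http://www.facebook.com/someuser",
    None,
]})

# Raw string so \? and \d reach the regex engine unchanged;
# \. matches a literal dot. The parentheses capture the digits.
pattern = r"profile\.php\?id=(\d+)"

# expand=False returns a Series when the pattern has one group;
# rows that do not match get NaN.
toy["FacebookID"] = toy["URL"].str.extract(pattern, expand=False)

# Drop only the rows where FacebookID is missing.
with_ids = toy.dropna(subset=["FacebookID"])

print(with_ids)
```

Note that str.extract returns the ID as a string. Convert it with astype if a numeric column is needed.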



In [10]:

    
newData.head()









    Out[10]:

	Primary Key	Service	Term	Username	Name	Update	Location	URL	Friends	Followers	Time(PDT)	City	State/Region	Country	Metro	Latitude	Longitude	FacebookID
14	15	twitter	lakers	MrBAAD	Tashaun Williams	@goodyCHOOshoes haha im sorry for you then... ...	Miami	http://www.facebook.com/profile.php?id=5141860...	187	143	6/3/2010 17:00	Miami	FL	US	Miami-Fort Lauderdale-Pompano Beach FL	25.604410	-80.335216	514186015
66	67	twitter	lakers,celtics	HoneyHoward	Jasmine Howard	bout to cook dinner&&split this wig before the...	Washington, D.C.	http://www.facebook.com/home.php#/profile.php?...	349	1155	6/3/2010 17:00	Washington	DC	US	Washington-Arlington-Alexandria DC-VA-MD-WV	38.950224	-77.019714	1707568551
84	85	twitter	lakers	Est_June3rd	Chuck K	hmmm im seein alot of new lakers fans on ma ti...	Pontiac,MI	http://www.facebook.com/profile.php?id=1060743...	1397	1606	6/3/2010 17:00	Pontiac	MI	US	Detroit-Warren-Livonia MI	42.668599	-83.290343	1060743307
155	156	twitter	celtics,lakers	NGz_Swift	Yung Crush	Celtics finish smash the Lakers so I guess som...	Los Angeles,CA	http://www.facebook.com/#!/profile.php?id=1701...	238	574	6/3/2010 17:00	Los Angeles	CA	US	Los Angeles-Long Beach-Santa Ana CA	34.009842	-118.258642	1701903898
638	639	twitter	lakers	GoodLookTy_BFA	Tyquan Moore	@Relly718 19 to 1 Lakers lol	Brooklyn, New York	http://www.facebook.com/profile.php?id=551167346	259	339	6/3/2010 17:02	Brooklyn	NY	US	New York-Northern New Jersey-Long Island NY-NJ-PA	40.645412	-73.958730	551167346



Exercise 3 output (studentData sorted by average, highest first):

	Student ID	Test1	Test2	Test3	Test4	Test5	average
8	9	0.90	0.85	0.76	0.83	0.99	0.866
6	7	0.88	0.59	0.94	0.92	0.83	0.832
0	1	0.78	0.96	0.82	0.63	0.88	0.814
7	8	0.93	0.70	0.92	0.91	0.56	0.804
2	3	0.66	0.80	0.97	0.77	0.80	0.800
1	2	0.95	0.80	0.70	0.82	0.72	0.798
9	10	0.67	0.84	0.54	0.73	0.89	0.734
3	4	0.73	0.67	0.85	0.68	0.59	0.704
4	5	0.54	0.76	0.65	0.54	0.92	0.682
5	6	0.70	0.70	0.60	0.54	0.63	0.634

